Towards a general and extensible phrase-extraction algorithm
نویسندگان
چکیده
Phrase-based systems deeply depend on the quality of their phrase tables and therefore, the process of phrase extraction is always a fundamental step. In this paper we present a general and extensible phrase extraction algorithm, where we have highlighted several control points. The instantiation of these control points allows the simulation of previous approaches, as in each one of these points different strategies/heuristics can be tested. We show how previous approaches fit in this algorithm, compare several of them and, in addition, we propose alternative heuristics, showing their impact on the final translation results. Considering two different test scenarios from the IWSLT 2010 competition (BTEC, Fr-En and DIALOG, Cn-En), we have obtained an improvement in the results of 2.4 and 2.8 BLEU points, respectively.
منابع مشابه
Extraction Programs: A Unified Approach to Translation Rule Extraction
We provide a general algorithmic schema for translation rule extraction and show that several popular extraction methods (including phrase pair extraction, hierarchical phrase pair extraction, and GHKM extraction) can be viewed as specific instances of this schema. This work is primarily intended as a survey of the dominant extraction paradigms, in which we make explicit the close relationship ...
متن کاملارائۀ راهکاری قاعدهمند جهت تبدیل خودکار درخت تجزیۀ نحوی وابستگی به درخت تجزیۀ نحوی ساختسازهای برای زبان فارسی
In this paper, an automatic method in converting a dependency parse tree into an equivalent phrase structure one, is introduced for the Persian language. In first step, a rule-based algorithm was designed. Then, Persian specific dependency-to-phrase structure conversion rules merged to the algorithm. Subsequently, the Persian dependency treebank with about 30,000 sentences was used as an input ...
متن کاملScalable Variational Inference for Extracting Hierarchical Phrase-based Translation Rules
We present a Variational-Bayes model for learning rules for the Hierarchical phrasebased model directly from the phrasal alignments. Our model is an alternative to heuristic rule extraction in hierarchical phrase-based translation (Chiang, 2007), which uniformly distributes the probability mass to the extracted rules locally. In contrast, in our approach the probability assigned to a rule is gl...
متن کاملA Generalized Alignment-Free Phrase Extraction
In this paper, we present a phrase extraction algorithm using a translation lexicon, a fertility model, and a simple distortion model. Except these models, we do not need explicit word alignments for phrase extraction. For each phrase pair (a block), a bilingual lexicon based score is computed to estimate the translation quality between the source and target phrase pairs; a fertility score is c...
متن کاملتولید درخت بانک سازهای زبان فارسی به روش تبدیل خودکار
Treebanks is one of important and useful resource in Natural Language Processing tasks. Dependency and phrase structures are two famous kinds of treebanks. There have already made many efforts to convert dependency structure to phrase structure. In this paper we study an approach to convert dependency structure to phrase structure because of lack of a big phrase structure Treebank in Persian. A...
متن کامل